42 research outputs found

    Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge

    With the growing demand for autonomous control and personalized speech generation, style control and transfer in Text-to-Speech (TTS) are becoming increasingly important. In this paper, we propose a new TTS system that performs style transfer with interpretability and high fidelity. First, we design a TTS system that combines a variational autoencoder (VAE) with a diffusion refiner to obtain refined mel-spectrograms. Specifically, we design both a two-stage and a one-stage system to improve audio quality and style transfer performance. Second, we design a diffusion bridge of a quantized VAE to efficiently learn complex discrete style representations and improve style transfer performance. To further strengthen style transfer, we introduce ControlVAE, which improves reconstruction quality while retaining good interpretability. Experiments on the LibriTTS dataset demonstrate that our method is more effective than baseline models. Comment: Accepted at Interspeech202

    A Chunk-Based Reordering Model for Phrase-Based SMT Systems

    This paper proposes a novel reordering model based on reordering source-language chunks. The model is used as a preprocessing step for phrase-based translation models and integrates well with them. Because it is chunk-based, syntactic information can be taken into account during reordering without requiring a full parse of the source sentence. Two experiments were carried out, and the results show that the proposed model can greatly improve the performance of a phrase-based statistical machine translation (SMT) system.

    Translation memory sharing models in XMCAT

    This paper describes in detail two Translation Memory (TM) sharing models adopted in XMCAT, a Computer Assisted Translation (CAT) tool that supports cooperative work in machine translation. One is the Center-based TM sharing model, which is suitable only for users on a local area network (LAN); the other is a novel P2P-based TM sharing model, which can be used over the Internet by geographically distributed users. With these two models, a user can share data with other users over the network, further reducing repeated work and making cooperation easier. The paper also presents the methods XMCAT uses to handle the multiple translations that arise in the cooperative memory sharing models. The XMCAT system has been adopted and approved by some translation companies.